Clustering Text Data Using Text ART Neural Network
Abstract
Most data mining studies have focused on structured data such as relational, transactional, and data warehouse data. However, most available information is stored in text databases, which consist of large collections of text documents such as news articles, research papers, and e-mail messages. Data stored in most text databases are unstructured, such as abstracts and document contents. The ability to deal with different types of attributes is a typical requirement of clustering in data mining, so mining unstructured data has become an increasingly important task in text mining. The main contribution of this paper is a method for clustering a data set whose feature values are non-numerical. Unlike conventional clustering algorithms such as K-Means, which form clusters in numerical domains, the Text ART Neural Network works directly on textual information without transforming the text into numerical values. Experiments were conducted on two datasets: a synthesized text document collection and Reuters-21578, Distribution 1.0. The F-Measure is used to measure the effectiveness of the proposed technique. According to the experimental results, the proposed neural network performs well in clustering text data, with F-Measures of 95.56% and 83.31% on the two datasets, respectively.
Keywords: Text Mining, Document Clustering, Unsupervised Learning, Artificial Neural Networks
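The abstract does not reproduce the F-Measure equation it refers to. As a rough illustration only, the sketch below uses one widely used clustering form of the measure: for each true class, take the best harmonic mean of precision and recall over all clusters, then average these values weighted by class size. Whether the paper uses exactly this form is an assumption, and the function name `clustering_f_measure` and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-size-weighted best-match F-measure for a clustering.

    Assumed definition (not confirmed by the abstract):
      precision(i, j) = n_ij / n_j, recall(i, j) = n_ij / n_i,
      F(i, j) = 2PR / (P + R), overall F = sum_i (n_i / n) * max_j F(i, j).
    """
    n = len(true_labels)
    if n == 0 or n != len(cluster_ids):
        raise ValueError("inputs must be non-empty and of equal length")

    class_sizes = Counter(true_labels)               # n_i
    cluster_sizes = Counter(cluster_ids)             # n_j
    overlap = Counter(zip(true_labels, cluster_ids)) # n_ij

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no shared documents, so F(i, j) is 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Toy example: four documents, two true topics, one document misplaced.
score = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 1])
```

A score of 1.0 means every class is recovered exactly by some cluster; under this definition, the reported 95.56% and 83.31% would correspond to scores of 0.9556 and 0.8331.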
Similar resources
Evaluating Quality of Text Clustering with ART1
Self-organizing large amounts of textual data according to some topic structure is an increasingly important application of clustering. Adaptive Resonance Theory (ART) neural networks possess several interesting properties that make them appealing in this area. Although ART has been used in several research works as a text clustering tool, the level of quality of the resulting document clu...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering
In this paper, we propose a novel end-to-end neural architecture for ranking candidate answers that adapts a hierarchical recurrent neural network and a latent topic clustering module. With our proposed model, a text is encoded to a vector representation from the word level to the chunk level to effectively capture the entire meaning. In particular, by adapting the hierarchical structure, our mode...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
On the quality of ART1 text clustering
There is a large and continually growing quantity of electronic text available, which contains essential human and organizational knowledge. An important research endeavor is to study and develop better ways to access this knowledge. Text clustering is a popular approach to automatically organize textual document collections by topics to help users find the information they need. Adaptive Resonanc...